User-generated content (UGC) is becoming a valuable organizational resource because it is often seen as a way to make more information available for analysis. To make effective use of UGC, it is necessary to understand information quality (IQ) in this setting. Traditional IQ research focuses on corporate data and views users as data consumers. However, as users with varying levels of expertise contribute information in an open setting, current conceptualizations of IQ break down. In particular, the practice of modeling information requirements in terms of fixed classes, such as an Entity-Relationship diagram or relational database tables, unnecessarily restricts the IQ of user-generated data sets. This paper defines crowd information quality (crowd IQ), empirically examines implications of class-based modeling approaches for crowd IQ, and offers a path for improving crowd IQ using instance-and-attribute-based modeling. To evaluate the impact of modeling decisions on IQ, we conducted three experiments. Results demonstrate that information accuracy depends on the classes used to model domains, with participants providing more accurate information when classifying phenomena at a more general level. In addition, we found greater overall accuracy when participants could provide free-form data compared to a condition in which they selected from constrained choices. We further demonstrate that, relative to attribute-based data collection, information loss occurs when class-based models are used. Our findings have significant implications for information quality, information modeling, and UGC research and practice.
Reusing database queries by adapting them to satisfy new information requests is an attractive strategy for extracting information from databases without involving database specialists. However, the reuse of information systems artifacts has been shown to be susceptible to the phenomenon of anchoring and adjustment. Anchoring often leads to a systematic adjustment bias in which people fail to make sufficient changes to an anchor in response to the needs of a new task. In a study involving 157 novice query writers from six universities, we examined the effect of this phenomenon on the reuse of Structured Query Language (SQL) queries under varying levels of domain familiarity and for different types of anchors. Participants developed SQL queries to respond to four information requests in a familiar domain and four information requests in an unfamiliar domain. For two information requests in each domain, participants were also provided with sample queries (anchors) that answered similar information requests. We found evidence that the opportunity to reuse sample queries resulted in an adjustment bias leading to poorer-quality query results and greater overconfidence in the correctness of results. The results also indicate that the strength of the adjustment bias depends on a combination of domain familiarity and type of anchor. This study demonstrates that anchoring and adjustment during query reuse can lead to queries that are less accurate than those written from scratch. We also extend the concept of anchoring and adjustment by distinguishing between surface-structure and deep-structure anchors and by considering the impact of domain familiarity on the adjustment bias.
Organizing phenomena into classes is a pervasive human activity. The ability to classify phenomena encountered in daily life in useful ways is essential to human survival and adaptation. Not surprisingly, then, classification-oriented activities are widespread in the information systems field. Classes or entity types play a central role in conceptual modeling for information systems requirements analysis, as well as in the design of databases and object-oriented software. Furthermore, classification is the primary task in applications such as data mining and the development of domain ontologies to support information sharing in semantic web applications. However, despite the pervasiveness of classification, little research has proposed well-grounded guidelines for identifying, evaluating, and choosing classes when modeling a domain or designing information systems artifacts. In this paper, we adopt the cognitive notions of inference and economy to derive a set of principles to guide effective and efficient classification. We present a model for characterizing what may be considered useful classes in a given context based on the inferences that can be drawn from membership in a class. This foundation is then used to suggest practical design rules for evaluating and refining potential classes. We illustrate the use of the rules by showing that applying them to a previously published example yields meaningful changes. We then present an evaluation by a panel of experts who compared the published and revised models. The evaluation shows that following the rules leads to semantically clearer models that are preferred by experts. The paper concludes by outlining possible future research directions.
Much research in conceptual data modeling has focused on developing techniques for view integration, or combining local conceptual schemas into a global schema. Local schemas are argued to be important in verifying conceptual data requirements before proceeding to database design. View integration is claimed to fulfill two purposes: First, a global conceptual schema is a prerequisite to logical design and implementation. Second, global schemas are thought to be useful in improving organizational communication among diverse user groups with different perspectives and information needs. However, performing view integration is difficult. Moreover, there is no empirical evidence that global schemas either impede local verification or support communication. Drawing on classification research, this paper develops and tests claims about the impact of schema structure (local versus global) on verification and communication. Local schemas are hypothesized to better support verification than global schemas. When different local views contain conflicting structure, local schemas are expected to be superior in supporting communication. However, when local views contain complementary structure, global schemas are expected to be superior in supporting communication. A laboratory experiment was conducted to test these predictions. The results support the hypotheses. Implications for the practice of database design and for further research are considered.